I'm working with a bash_history file that contains blocks in the following format: #unixtimestamp\ncommand\n
Here's a sample of the bash_history file:
#1713308636
cat > ./initramfs/init << "EOF"
#!/bin/sh
/bin/sh
EOF
#1713308642
file initramfs/init
#1713308686
cpio -v -t -F init.cpio
#1713308689
cpio -v -t -F init.cpio
#1713308690
ls
#1713308691
ls
My goal is to de-duplicate whole blocks: when a command repeats, drop the entire block, both the timestamp and the associated command, keeping only the first occurrence. I've attempted this with awk, but my attempts processed lines individually rather than treating a timestamp and its command as one block.
I know that HISTCONTROL=ignoredups prevents consecutive duplicates from being saved, but it won't help here: it only takes effect when the exact command is retyped, and the duplicate commands are already in the file.
I'd appreciate suggestions on a more effective way to achieve this de-duplication.
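To make the block structure concrete, here is a rough sketch of the kind of block-aware awk I have in mind (just a sketch, not a working attempt; it assumes every block starts with a line matching ^#[0-9]+$, that nothing inside a command or here-document looks like that, and that the history file is at ~/.bash_history):

awk '
/^#[0-9]+$/ {                 # a timestamp line starts a new block
    print_block()
    ts = $0
    cmd = ""
    next
}
{ cmd = cmd $0 ORS }          # accumulate the command lines of the current block
END { print_block() }
function print_block() {
    # emit the block only the first time its command text is seen
    if (ts != "" && !seen[cmd]++)
        printf "%s\n%s", ts, cmd
}
' ~/.bash_history

On the sample above this would keep the first cpio and ls blocks (#1713308686 and #1713308690) and drop the later duplicates.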
EDIT: as suggested by Ed Morton in the comments, here's the expected output:
#1713308636
cat > ./initramfs/init << "EOF"
#!/bin/sh
/bin/sh
EOF
#1713308642
file initramfs/init
#1713308686
cpio -v -t -F init.cpio
#1713308690
ls
As a workaround, I added delete functionality to this program, but I'm still open to other approaches that use existing commands.
For context, the relevant part of Ed Morton's comment was: "... #1234567890 at your command prompt. Now type a here-document including that same string. Now update your example to show your history file contents including those lines and a few others, including some duplicates. That is to test if we can robustly use a regexp like ^#[0-9]{10}$ or similar as a delimiter between records. If either of those strings in your history file are formatted indistinguishably from your timestamps like #1713308636 then it becomes a much harder problem to solve."
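Along those lines, here is a sketch using GNU awk's regex record separator (it relies on gawk setting RT to the matched separator, and it breaks in exactly the case the comment describes, i.e. if any command contains text that looks like a bare timestamp line; the ~/.bash_history path is assumed):

gawk '
BEGIN { RS = "#[0-9]{10}\n" }      # GNU awk only: regex record separator; RT holds the matched timestamp line
{
    # $0 is the command text that followed the previous timestamp (empty before the first one)
    if (NR > 1 && !seen[$0]++)
        printf "%s%s", ts, $0
    ts = RT                        # the timestamp line that introduces the next block
}
' ~/.bash_history

Here each chunk of command text between timestamp lines becomes one record, so duplicates are detected by comparing whole records instead of individual lines.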