So I am trying to calculate a SHA1 digest (and base64 encode it) for an attachment I received so it matches their (base64 encoded) digest. Using ChatGPT, I was able to use the command below to succeed. But I don't understand the command xxd -r -p or why it's necessary.

sha1sum foo | awk '{print $1}' | xxd -r -p | base64

There are two things I don't understand about it:

  1. Why is the option -p needed? I understand xxd -r makes a binary dump of a hexdump, but if I leave off the -p it prints nothing while the man page says it should print to stdout and that -p is just for a particular output style.
  2. What is the purpose of even passing the SHA1 digest through xxd? My understanding is xxd essentially converts data between hex and binary. But when I run the above command without xxd -r -p, it gives a different, and unmatched, output. I do know what hex style/representation is (that's the format of a SHA1 digest), but perhaps I don't know what "binary" means in this context?
  • Actually, you only need openssl for that: openssl sha1 -binary foo | openssl base64 Commented Nov 28, 2023 at 1:15

1 Answer

First, look at the input to xxd -r -p to find out what we're dealing with. Here's an example:

echo boo | sha1sum | awk '{print $1}'
6132b58967cf1ebc05062492c17145e5ee9f82a8

This line generates the SHA1 hash of the four characters, boo and its trailing newline.
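
To see that the newline really is part of the hashed input, here is a small sketch. It assumes a POSIX shell with sha1sum and awk available; the variable names with_nl and without_nl are only illustrative. It compares echo, which appends a newline, with printf, which does not:

```shell
# Hash "boo" with and without the trailing newline.
with_nl=$(echo boo | sha1sum | awk '{print $1}')         # input: b o o \n (4 bytes)
without_nl=$(printf 'boo' | sha1sum | awk '{print $1}')  # input: b o o    (3 bytes)

echo "with newline:    $with_nl"
echo "without newline: $without_nl"

# The two digests differ, because a single extra input byte changes the hash.
if [ "$with_nl" != "$without_nl" ]; then
    echo "digests differ"
fi
```

This matters when comparing against someone else's digest: hashing a file (sha1sum foo) covers exactly the bytes in the file, whereas piping text through echo adds one.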

Now let's investigate the xxd command a little. The man page for xxd (see man xxd) explains that the xxd -r command reverses a normal hexdump generated by xxd. So we can construct a pipeline like this:

echo boo | xxd | xxd -r
boo

Now contrast these two outputs:

echo boo | xxd
00000000: 626f 6f0a                                boo.

echo boo | xxd -p
626f6f0a

To reverse the second output into binary we need not only -r but also -p:

echo boo | xxd -p | xxd -r -p
boo
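
As a rough illustration of why -p is needed here (a sketch assuming the xxd that ships with vim; the variable names are made up for the example), the same plain hex can be fed to xxd -r with and without -p and the number of output bytes counted:

```shell
# Plain hex as produced by xxd -p: no offset column, no ASCII column.
plain=$(echo boo | xxd -p)
echo "plain hex: $plain"

# With -p, xxd -r reads bare hex digits and writes the raw bytes back out.
restored=$(printf '%s\n' "$plain" | xxd -r -p)
echo "restored:  $restored"

# Without -p, xxd -r expects the default "offset: hex bytes" layout.
# The question reports that nothing is printed in this case, so the
# count is only displayed here, not assumed.
count_without_p=$(( $(printf '%s\n' "$plain" | xxd -r | wc -c) ))
echo "bytes written without -p: $count_without_p"
```

In other words, -p on the reverse side is less an output style than a statement about the input format xxd -r should expect.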

This structure (hex bytes without spaces or other formatting, generated by xxd -p) happens to match the output from sha1sum, which prints the 20-byte digest as 40 ASCII hex characters. That is why it can be converted back to the raw 20 bytes ("binary") using this approach. The resulting binary is then (finally!) passed through base64, converting it to Base64:

echo boo | sha1sum | awk '{print $1}' | xxd -r -p | base64
YTK1iWfPHrwFBiSSwXFF5e6fgqg=
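
To make the difference between the hex text and the raw digest concrete, here is a hedged sketch (assuming sha1sum, xxd and base64 are installed; openssl is optional and only used if present; all variable names are illustrative). It encodes both forms and shows that they give different Base64 strings:

```shell
hex=$(echo boo | sha1sum | awk '{print $1}')

# Base64 of the raw 20-byte digest (what the pipeline above produces).
raw_b64=$(printf '%s\n' "$hex" | xxd -r -p | base64)

# Base64 of the 40 hex characters themselves (what you get without xxd -r -p).
text_b64=$(printf '%s\n' "$hex" | base64)

echo "raw digest, base64: $raw_b64"
echo "hex text, base64:   $text_b64"

# The openssl pipeline suggested in the comments should agree with raw_b64,
# since -binary makes openssl emit the raw digest directly.
if command -v openssl >/dev/null 2>&1; then
    ssl_b64=$(echo boo | openssl sha1 -binary | openssl base64)
    echo "openssl, base64:    $ssl_b64"
fi
```

The second value encodes ASCII characters such as "6", "1", "3", "2" rather than the bytes 0x61, 0x32 and so on, which is why it is longer and never matches a digest computed from the raw bytes.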
  • Gotcha. Tinkered with it myself and the -p makes sense, as in this case it matches the format that sha1sum produces. But I still don't understand why the SHA1 digest is passed through xxd anyway. Is it because the output of sha1sum is actually in ASCII and not binary? As in, the output of sha1sum is in the same ASCII format as the output of echo boo? Commented Nov 27, 2023 at 20:25
  • I've no idea why you'd want to do that either. Happy to look at the actual problem (rather than your partial solution from ChatGPT that may or may not be accurate). Ask away - for example "Given this base64 encoded digest and these instructions that came with it, how do I validate my download" - and if you like, link it here Commented Nov 27, 2023 at 20:27
  • Sorry this took longer than I wanted. These are the two relevant lines from the Python2 program provided. Everything above them is about preparing the attachment I'm calculating the checksum for: sha1Hash = hashlib.sha1(encoded_attachment); attachmenthash = base64.encodestring(sha1Hash.digest()).strip() Commented Nov 27, 2023 at 21:32
  • Unfortunately python code is of little use to me, @geckels1 Commented Nov 27, 2023 at 23:02
  • Ah sry. Those are the only "instructions" I've been given. Yes, this web service is frustrating to use. But you have been very helpful. Thanks! Commented Nov 27, 2023 at 23:21
