Revisions to Python implementation of SHA1

make code portable between Python 2 and Python 3; add assert

Source Link

edited Jan 30, 2014 at 11:31

50.1k
3
130
211

There's no documentation. What does this code do and how am I supposed to use it?

There are no test cases. Since the hashlibhashlib module has a sha1 function, it would be easy to write a test case that generates some random data and hashes it with both algorithms. For example:

 import hashlib
 from random import randrange
 from unittest import TestCase

 class TestSha1(TestCase):
     def test_sha1(self):
         for l in range(129):
             data = ''.join(chrbytearray(randrange(256)) for _ in range(l))
             self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))

You start by converting the message to a string containing its representation in binary. But then later on you convert it back to integers again. It would be simpler and faster to work with bytes throughout. Python provides the bytearraybytearray type for manipulating sequences of bytes and the structstruct module for interpreting sequences of bytes as binary data. So you could initialize and pad the message like this:
```
 message = bytearray(data)
 # Append a 1-bit.
 message.append(b'\x80'0x80)
 # Pad with zeroes until message length in bytes is 56 (mod 64).
 message.extend(b'\x00'[0] * (63 - (len(message) + 7) % 64))
 # Append the original length (big-endian, 64 bits).
 message.extend(struct.pack('>Q', len(data) * 8))
```

and then convert the chunks to 32-bit integers using struct.unpack like this:

    for chunk in range(0, len(message), 64):
        w = list(struct.unpack('>16L', message[chunk: chunk+64]))
        for i in range(16, 80):
            w.append(rol((w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]), 1))

I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40 and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range and so on).
I would replace elif 60 <= i <= 79: by else: to make it clear that this must match.:
```
 elif 60 <= i <= 79:
```

by

    else:
        assert 60 <= i < 80

to make it clear that this case must match.

You can avoid the need for temp by using a tuple assignment:

 a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff, a, rol(b, 30), c, d)

There's no documentation. What does this code do and how am I supposed to use it?

There are no test cases. Since the hashlib module has a sha1 function, it would be easy to write a test case that generates some random data and hashes it with both algorithms. For example:

 import hashlib
 from random import randrange
 from unittest import TestCase

 class TestSha1(TestCase):
     def test_sha1(self):
         for l in range(129):
             data = ''.join(chr(randrange(256)) for _ in range(l))
             self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))

You start by converting the message to a string containing its representation in binary. But then later on you convert it back to integers again. It would be simpler and faster to work with bytes throughout. Python provides the bytearray type for manipulating sequences of bytes and the struct module for interpreting sequences of bytes as binary data. So you could initialize and pad the message like this:
```
 message = bytearray(data)
 # Append a 1-bit.
 message.append(b'\x80')
 # Pad with zeroes until message length in bytes is 56 (mod 64).
 message.extend(b'\x00' * (63 - (len(message) + 7) % 64))
 # Append the original length (big-endian, 64 bits).
 message.extend(struct.pack('>Q', len(data) * 8))
```

and then convert the chunks to 32-bit integers like this:

    for chunk in range(0, len(message), 64):
        w = list(struct.unpack('>16L', message[chunk: chunk+64]))

I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40 and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range and so on).
I would replace elif 60 <= i <= 79: by else: to make it clear that this must match.

You can avoid the need for temp by using a tuple assignment:

 a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff, a, rol(b, 30), c, d)

There's no documentation. What does this code do and how am I supposed to use it?

There are no test cases. Since the hashlib module has a sha1 function, it would be easy to write a test case that generates some random data and hashes it with both algorithms. For example:

 import hashlib
 from random import randrange
 from unittest import TestCase

 class TestSha1(TestCase):
     def test_sha1(self):
         for l in range(129):
             data = bytearray(randrange(256) for _ in range(l))
             self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))

You start by converting the message to a string containing its representation in binary. But then later on you convert it back to integers again. It would be simpler and faster to work with bytes throughout. Python provides the bytearray type for manipulating sequences of bytes and the struct module for interpreting sequences of bytes as binary data. So you could initialize and pad the message like this:
```
 message = bytearray(data)
 # Append a 1-bit.
 message.append(0x80)
 # Pad with zeroes until message length in bytes is 56 (mod 64).
 message.extend([0] * (63 - (len(message) + 7) % 64))
 # Append the original length (big-endian, 64 bits).
 message.extend(struct.pack('>Q', len(data) * 8))
```

and then convert the chunks to 32-bit integers using struct.unpack like this:

    for chunk in range(0, len(message), 64):
        w = list(struct.unpack('>16L', message[chunk: chunk+64]))
        for i in range(16, 80):
            w.append(rol((w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]), 1))

I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40 and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range and so on).
I would replace:
```
 elif 60 <= i <= 79:
```

by

    else:
        assert 60 <= i < 80

to make it clear that this case must match.

You can avoid the need for temp by using a tuple assignment:

 a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff, a, rol(b, 30), c, d)

use b'' for bytes

Source Link

edited Jan 28, 2014 at 12:49

Gareth Rees

50.1k
3
130
211

There's no documentation. What does this code do and how am I supposed to use it?

There are no test cases. Since the hashlib module has a sha1 function, it would be easy to write a test case that generates some random data and hashes it with both algorithms. For example:

 import hashlib
 from random import randrange
 from unittest import TestCase

 class TestSha1(TestCase):
     def test_sha1(self):
         for l in range(129):
             data = ''.join(chr(randrange(256)) for _ in range(l))
             self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))

You start by converting the message to a string containing its representation in binary. But then later on you convert it back to integers again. It would be simpler and faster to work with bytes throughout. Python provides the bytearray type for manipulating sequences of bytes and the struct module for interpreting sequences of bytes as binary data. So you could initialize and pad the message like this:
```
 message = bytearray(data)
 # Append a 1-bit.
 message.append(0x80b'\x80')
 # Pad with zeroes until message length in bytes is 56 (mod 64).
 message.extend(0 for _b'\x00' in* range(63 - (len(message) + 7) % 64))
 # Append the original length (big-endian, 64 bits).
 message.extend(struct.pack('>Q', len(data) * 8))
```

and then convert the chunks to 32-bit integers like this:

    for chunk in range(0, len(message), 64):
        w = list(struct.unpack('>16L', message[chunk: chunk+64]))

This runs substantially faster than your code.

I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40 and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range and so on).
I would replace elif 60 <= i <= 79: by else: to make it clear that this must match.

You can avoid the need for temp by using a tuple assignment:

 a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff, a, rol(b, 30), c, d)

There's no documentation. What does this code do and how am I supposed to use it?

There are no test cases. Since the hashlib module has a sha1 function, it would be easy to write a test case that generates some random data and hashes it with both algorithms. For example:

 import hashlib
 from random import randrange
 from unittest import TestCase

 class TestSha1(TestCase):
     def test_sha1(self):
         for l in range(129):
             data = ''.join(chr(randrange(256)) for _ in range(l))
             self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))

You start by converting the message to a string containing its representation in binary. But then later on you convert it back to integers again. It would be simpler and faster to work with bytes throughout. Python provides the bytearray type for manipulating sequences of bytes and the struct module for interpreting sequences of bytes as binary data. So you could initialize and pad the message like this:
```
 message = bytearray(data)
 # Append a 1-bit.
 message.append(0x80)
 # Pad with zeroes until message length in bytes is 56 (mod 64).
 message.extend(0 for _ in range(63 - (len(message) + 7) % 64))
 # Append the original length (big-endian, 64 bits).
 message.extend(struct.pack('>Q', len(data) * 8))
```

and then convert the chunks to 32-bit integers like this:

    for chunk in range(0, len(message), 64):
        w = list(struct.unpack('>16L', message[chunk: chunk+64]))

This runs substantially faster than your code.

I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40 and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range and so on).
I would replace elif 60 <= i <= 79: by else: to make it clear that this must match.

You can avoid the need for temp by using a tuple assignment:

 a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff, a, rol(b, 30), c, d)

There's no documentation. What does this code do and how am I supposed to use it?

There are no test cases. Since the hashlib module has a sha1 function, it would be easy to write a test case that generates some random data and hashes it with both algorithms. For example:

 import hashlib
 from random import randrange
 from unittest import TestCase

 class TestSha1(TestCase):
     def test_sha1(self):
         for l in range(129):
             data = ''.join(chr(randrange(256)) for _ in range(l))
             self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))

You start by converting the message to a string containing its representation in binary. But then later on you convert it back to integers again. It would be simpler and faster to work with bytes throughout. Python provides the bytearray type for manipulating sequences of bytes and the struct module for interpreting sequences of bytes as binary data. So you could initialize and pad the message like this:
```
 message = bytearray(data)
 # Append a 1-bit.
 message.append(b'\x80')
 # Pad with zeroes until message length in bytes is 56 (mod 64).
 message.extend(b'\x00' * (63 - (len(message) + 7) % 64))
 # Append the original length (big-endian, 64 bits).
 message.extend(struct.pack('>Q', len(data) * 8))
```

and then convert the chunks to 32-bit integers like this:

    for chunk in range(0, len(message), 64):
        w = list(struct.unpack('>16L', message[chunk: chunk+64]))

This runs substantially faster than your code.

I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40 and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range and so on).
I would replace elif 60 <= i <= 79: by else: to make it clear that this must match.

You can avoid the need for temp by using a tuple assignment:

 a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff, a, rol(b, 30), c, d)

Source Link

answered Dec 18, 2013 at 11:51

Gareth Rees

50.1k
3
130
211

There's no documentation. What does this code do and how am I supposed to use it?

There are no test cases. Since the hashlib module has a sha1 function, it would be easy to write a test case that generates some random data and hashes it with both algorithms. For example:

 import hashlib
 from random import randrange
 from unittest import TestCase

 class TestSha1(TestCase):
     def test_sha1(self):
         for l in range(129):
             data = ''.join(chr(randrange(256)) for _ in range(l))
             self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))

You start by converting the message to a string containing its representation in binary. But then later on you convert it back to integers again. It would be simpler and faster to work with bytes throughout. Python provides the bytearray type for manipulating sequences of bytes and the struct module for interpreting sequences of bytes as binary data. So you could initialize and pad the message like this:
```
 message = bytearray(data)
 # Append a 1-bit.
 message.append(0x80)
 # Pad with zeroes until message length in bytes is 56 (mod 64).
 message.extend(0 for _ in range(63 - (len(message) + 7) % 64))
 # Append the original length (big-endian, 64 bits).
 message.extend(struct.pack('>Q', len(data) * 8))
```

and then convert the chunks to 32-bit integers like this:

    for chunk in range(0, len(message), 64):
        w = list(struct.unpack('>16L', message[chunk: chunk+64]))

This runs substantially faster than your code.

I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40 and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range and so on).
I would replace elif 60 <= i <= 79: by else: to make it clear that this must match.

You can avoid the need for temp by using a tuple assignment:

 a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff, a, rol(b, 30), c, d)

Stack Exchange Network

Return to Answer