Gareth Rees
  1. There's no documentation. What does this code do and how am I supposed to use it?

  2. There are no test cases. Since the hashlib module has a sha1 function, it would be easy to write a test case that generates some random data, hashes it with both implementations, and checks that the digests agree. For example:

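     import hashlib
     from random import randrange
     from unittest import TestCase

     # sha1 is the function under review; it should return a hex digest.
     class TestSha1(TestCase):
         def test_sha1(self):
             # Lengths 0 to 128 bytes cover messages that pad to one,
             # two and three 64-byte blocks.
             for n in range(129):
                 data = bytearray(randrange(256) for _ in range(n))
                 self.assertEqual(hashlib.sha1(data).hexdigest(), sha1(data))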
    
  3. You start by converting the message to a string containing its representation in binary, but later on you convert it back to integers. It would be simpler and faster to work with bytes throughout. Python provides the bytearray type for manipulating sequences of bytes and the struct module for interpreting sequences of bytes as binary data, so you could initialize and pad the message like this:

     import struct

     # data must be a bytes-like object (e.g. bytes or bytearray).
     message = bytearray(data)
     # Append a 1-bit (the byte 0x80 is a 1-bit followed by seven 0-bits).
     message.append(0x80)
     # Pad with zeroes until message length in bytes is 56 (mod 64).
     message.extend([0] * (63 - (len(message) + 7) % 64))
     # Append the original length in bits (big-endian, 64 bits).
     message.extend(struct.pack('>Q', len(data) * 8))
    

     and then convert each 64-byte chunk to sixteen 32-bit big-endian integers using struct.unpack, extending them to the 80-word message schedule like this:

     for chunk in range(0, len(message), 64):
         w = list(struct.unpack('>16L', message[chunk:chunk + 64]))
         for i in range(16, 80):
             w.append(rol(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1))
  4. I would write the conditions in the main loop as 0 <= i < 20, 20 <= i < 40, and so on, to match the half-open intervals that Python uses elsewhere (in list slices, range, and so on).

  5. I would replace:

     elif 60 <= i <= 79:

     with

     else:
         assert 60 <= i < 80

     to make it clear that this case must match.

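  6. You can avoid the need for temp by using a tuple assignment:

     a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff,
                      a, rol(b, 30), c, d)

     The right-hand side is evaluated in full before any name is rebound, so every expression on it uses the old values of a to e.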
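
Putting these suggestions together, the function might end up looking something like the following. This is a sketch rather than a drop-in replacement: the original rol helper and function signature aren't shown here, so the rol below is an assumed implementation (a 32-bit left rotation), and sha1 is assumed to take a bytes-like object and return a hex digest, as the test case above expects. The round functions, constants and initial values are the standard ones from the SHA-1 specification (FIPS 180-4).

```python
import struct


def rol(x, n):
    """Assumed helper: rotate the 32-bit integer x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xffffffff


def sha1(data):
    """Return the SHA-1 digest of data (a bytes-like object) as a
    40-character lowercase hexadecimal string."""
    h0, h1, h2, h3, h4 = (0x67452301, 0xEFCDAB89, 0x98BADCFE,
                          0x10325476, 0xC3D2E1F0)

    # Pad: a 1-bit, zeroes until the length is 56 (mod 64) bytes,
    # then the original length in bits as a big-endian 64-bit integer.
    message = bytearray(data)
    message.append(0x80)
    message.extend([0] * (63 - (len(message) + 7) % 64))
    message.extend(struct.pack('>Q', len(data) * 8))

    for chunk in range(0, len(message), 64):
        # Sixteen big-endian 32-bit words, extended to the 80-word schedule.
        w = list(struct.unpack('>16L', message[chunk:chunk + 64]))
        for i in range(16, 80):
            w.append(rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))

        a, b, c, d, e = h0, h1, h2, h3, h4
        for i in range(80):
            # Half-open intervals, with a final else guarded by an assert.
            if 0 <= i < 20:
                f = (b & c) | (~b & d)
                k = 0x5A827999
            elif 20 <= i < 40:
                f = b ^ c ^ d
                k = 0x6ED9EBA1
            elif 40 <= i < 60:
                f = (b & c) | (b & d) | (c & d)
                k = 0x8F1BBCDC
            else:
                assert 60 <= i < 80
                f = b ^ c ^ d
                k = 0xCA62C1D6
            # Tuple assignment replaces the temp variable.
            a, b, c, d, e = ((rol(a, 5) + f + e + k + w[i]) & 0xffffffff,
                             a, rol(b, 30), c, d)

        # Add this chunk's result into the running hash, modulo 2**32.
        h0 = (h0 + a) & 0xffffffff
        h1 = (h1 + b) & 0xffffffff
        h2 = (h2 + c) & 0xffffffff
        h3 = (h3 + d) & 0xffffffff
        h4 = (h4 + e) & 0xffffffff

    return '%08x%08x%08x%08x%08x' % (h0, h1, h2, h3, h4)
```

For example, sha1(b'abc') returns 'a9993e364706816aba3e25717850c26c9cd0d89d', the standard test vector for that input.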
    