14

In python, I have been given a 64 bit integer. This Integer was created by taking several different 8 bit integers and mashing them together into one giant 64 bit integer. It is my job to separate them again.

For example:

Source number: 2592701575664680400
Binary (64 bits): 0010001111111011001000000101100010101010000101101011111000000000
int 1: 00100011 (35)
int 2: 11111011 (251)
int 3: 00100000 (32)
int 4: 01011000 (88)
int 5: 10101010 (170)
int 6: 00010110 (22)
int 7: 10111110 (190)
int 8: 00000000 (0)

So what I would like to do is take my source number 2592701575664680373 and return an array of length 8, where each int in the array are the ints listed above.

I was going to use struct, but to be perfectly honest, reading the documentation hasn't made it quite clear exactly how I would accomplish that.

5
  • Have you tried divmod()? Commented Sep 9, 2015 at 22:33
  • Dang it, you are correct @PadraicCunningham. I was using a quick a dirty tool that didn't support numbers large enough, and it trucnated that last part with 0's. Now that i have run bin = '{0:064b}'.format(source) i see that you are correct. Commented Sep 9, 2015 at 22:39
  • The fact n is odd and there was no 1 at the end did have me confused Commented Sep 9, 2015 at 22:41
  • I changed the source number (seemed better than changing the binary, given the answers) Commented Sep 9, 2015 at 22:42
  • The struct documentation isn't all that bad, you just have to load your number into a string before you split it out into bytes. Commented Sep 10, 2015 at 20:16

5 Answers 5

10

Solution

Solution without converting the number to a string:

x = 0b0010001111111011001000000101100010101010000101101011111000000000

numbers = list((x >> i) & 0xFF for i in range(0,64,8))
print(numbers)                    # [0, 190, 22, 170, 88, 32, 251, 35]
print(list(reversed(numbers)))    # [35, 251, 32, 88, 170, 22, 190, 0]

Explanation

Here I used list comprehensions, making a loop in increments of 8 over i. So i takes the values 0, 8, 16, 24, 32, 40, 48, 56. Every time, the bitshift operator >> temporarily shifts the number x down by i bits. This is equivalent to dividing by 256^i.

So the resulting number is:

i = 0:   0010001111111011001000000101100010101010000101101011111000000000
i = 8:           00100011111110110010000001011000101010100001011010111110
i = 16:                  001000111111101100100000010110001010101000010110
i = 24:                          0010001111111011001000000101100010101010
i = 32:                                  00100011111110110010000001011000
i = 40:                                          001000111111101100100000
i = 48:                                                  0010001111111011
i = 56:                                                          00100011

By usig & 0xFF, I select the last 8 bits of this number. Example:

x >> 48:           001000111111101100100000
0xff:                              11111111
(x >> 48) & 0xff:  000000000000000000100000

Since the leading zeros do not matter, you have the desired number.

The result is converted to a list and printed in normal and reversed order (like OP wanted it).

Performance

I compared the timing of this result to the other solutions proposed in this thread:

In: timeit list(reversed([(x >> i) & 0xFF for i in range(0,64,8)]))
100000 loops, best of 3: 13.9 µs per loop

In: timeit [(x >> (i * 8)) & 0xFF for i in range(7, -1, -1)]
100000 loops, best of 3: 11.1 µs per loop

In: timeit [(x >> i) & 0xFF for i in range(63,-1,-8)]
100000 loops, best of 3: 10.2 µs per loop

In: timeit reversed(struct.unpack('8B', struct.pack('Q', x)))
100000 loops, best of 3: 3.22 µs per loop

In: timeit reversed(struct.pack('Q', x))
100000 loops, best of 3: 2.07 µs per loop

Result: my solution is not the fastest! Currently, using struct directly (as proposed by Mark Ransom) seems to be the fastest snippet.

Sign up to request clarification or add additional context in comments.

5 Comments

You can also [(n >> (i * 8)) & 0xFF for i in range(7, -1, -1)] and forget reversing
For some reason, I get different timing results. I am using iPython 2.0.0 on Python 3.4.2, 32 bit. On a 64-bit Windows machine.
There is a simple answer, you are doing nothing in the first code, you have a generator expression inside the list
You would still have to reverse the list, [(x >> (i * 8)) & 0xFF for i in range(7, -1, -1)] gives you the output in the correct order
Using struct is about three times faster.
9

In Python 2.x, struct.pack returns a string of bytes. It's easy to convert that to an array of integers.

>>> bytestr = struct.pack('>Q', 2592701575664680400)
>>> bytestr
'#\xfb X\xaa\x16\xbd\xd0'
>>> [ord(b) for b in bytestr]
[35, 251, 32, 88, 170, 22, 189, 208]

The struct module in python is used for converting from python object to byte strings, typically packed according to C structure packing rules. struct.pack takes a format specifier (a string which describes how the bytes of the structure should be laid out), and some python data, and packs it into a byte string. struct.unpack does the inverse, taking a format specifier and a byte string and returning a tuple of unpacked data once again in the format of python objects.

The format specifier being used has two parts. The lead character specifies the endianness (byte order) of the string. The following characters specify the types of the fields of the struct being packed or unpacked. So '>Q' means to pack the given data as a big-endian unsigned long long. To get the bytes in the opposite order, you could use < instead for little-endian.

The final operation is a list comprehension which iterates over the characters of the byte string and uses the ord builtin function to get the integer representation of that character.

Final note: Python doesn't actually have a concept of integer size. In 2.x, there is int which is limited to 32 bits, and long which is of unlimited size. In 3.x those two were unified into a single type. So even though this operation guarantees to give integers that take up only one byte, noting about python will force the resulting integers to stay that way if you use them in other operations.

2 Comments

Thank you so much for the explanation! Not only does this solve my problem, but I feel much more confident in my ability to use the struct module from now on.
@JHixson you can thank zstewart who added the entire explanation after I answered with the code.
2
bn = "0010001111111011001000000101100010101010000101101011111000000000"

print([int(bn[i:i+8], 2) for i in range(0,len(bn), 8)])
[35, 251, 32, 88, 170, 22, 190, 0]

If you are using the binary representation of n then the output would be different:

n = 2592701575664680373
bn = bin(n)

print([int(bn[i:i+8], 2) for i in range(0,len(bn), 8)])
[35, 251, 32, 88, 170, 22, 189, 181]

Some timings:

In [16]: %%timeit                                                
numbers = list((n >> i) & 0xFF for i in range(0,64,8))
list(reversed(numbers))
   ....: 
100000 loops, best of 3: 2.97 µs per loop

In [17]: timeit [(n >> (i * 8)) & 0xFF for i in range(7, -1, -1)]
1000000 loops, best of 3: 1.73 µs per loop

In [18]: %%timeit                                                
bn = bin(n)
[int(bn[i:i+8], 2) for i in range(0,len(bn), 8)]
   ....: 
100000 loops, best of 3: 3.96 µs per loop

You can also just divmod:

out = []
for _ in range(8):
    n, i = divmod(n, 256)
    out.append(i) 
out = out[::-1]

Which is almost as efficient:

In [31]: %%timeit
   ....: n = 2592701575664680411
   ....: out = []
   ....: for _ in range(8):
   ....:     n, i = divmod(n, 1 << 8)
   ....:     out.append(i)
   ....: out[::-1]
   ....: 
100000 loops, best of 3: 2.35 µs per loop

There is very little advantage in bit shifting with python, I would be more inclined to use whatever you and others find more readable.

Comments

2

Here's a version using struct:

import struct
n = 2592701575664680400
bytes = struct.unpack('8B', struct.pack('Q', n))

The bytes are returned in the opposite order that you showed in your question.

Here are the performance stats:

python -m timeit -s "import struct" "struct.unpack('8B', struct.pack('Q', 2592701575664680400))"
1000000 loops, best of 3: 0.33 usec per loop

On my computer, this is three times faster than the byte-shifting solution.

1 Comment

You can probably control the order the bytes are returned by specifying a byte order for the 64-bit integer (e.g. big-endian, with >).
0

This seems faster for a bunch of unit64. Uses numpy.

from cytpes import *
import numpy as np
l1 = c_uint64 * 512
payload64 = l1(0)
payload8 = np.frombuffer(payload64, dtype=np.uint8)

Where payload8 is an array of np.unit8 afterwards 8 times the size of payload64 and has the converterd bytes in it.

For me it is faster than the struct variant...

for i in range(len(payload64)):
       payload8[i*8:i*8+8] = struct.unpack('8B', struct.pack('Q', payload64[i]))

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.